In the TCGA cohort we found 11 variants of interest, of which 9 variants can be validated by RNASeq data.

The samples of interest are those where the variant has been called either in DNA or RNA, and have RNASeq data available for validation. The splice junction search is made based on the STAR-SJCounts 1-based intronic positions.

Splicing alterations to be evaluated:

1 NRAS chr1,114716123,C,T

Variant found in 5 patients of the TCGA (5 samples).

  • Patients with NRAS chr1,114716123,C,T variant: 5 patients (5 samples)
  • Patients with the variant and RNASeq for validation: 4 patients (4 samples)

The splicing alterations being assessed are:

  • Donor Gain: predicted at 4bp from the variant, chr1:114716126, found in the splice junction collection, but not found in the mutated samples.
  • Exon 2 Skipping: chr1:114713979-114716657, found in the mutated samples.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"NRAS_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set the sample’s group: Mutated (MUT) or No Mutated (WT)

samples_df <- found_variants[found_variants$Gene=="NRAS" & found_variants$MutationKey_Hg38 == "chr1,114716123,C,T",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

1.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

1.1.1 Donor Gain

Search: predicted at 4 bp from the variant

Show all the splice junctions containing the position 114716126

colnames(GeneSJ)[grep("114716126",colnames(GeneSJ))]
## [1] "chr1_114709722_114716126"

Found: chr1_114709722_114716126

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr1_114709722_114716126
##   [1] 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0

Samples with the SJ of interest:

table(GeneSJ$chr1_114709722_114716126>0) 
## 
## FALSE  TRUE 
##   149     1

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr1_114709722_114716126 > 0])
## 
## WT 
##  1

Alternative SJ not found in the mutated samples.

1.1.2 Exon Skipping 2

Search: chr1:114713979-114716657

Show all the splice junctions containing the position 114713979

colnames(GeneSJ)[grep("114713979",colnames(GeneSJ))]
## [1] "chr1_114713979_114716049" "chr1_114713979_114716657"

Found: chr1:114713979-114716657

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr1_114713979_114716657
##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [38] 0 7 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0
## [149] 0 0

Samples with the SJ of interest:

table(GeneSJ$chr1_114713979_114716657>0) 
## 
## FALSE  TRUE 
##   139    11

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr1_114713979_114716657 > 0])
## 
## MUT  WT 
##   1  10

Alternative SJ found in the mutated samples.

1.1.3 Canonical SJ

Exon1-2: chr1:114716178-114716657

Exon2-3: chr1:114713979-114716049

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr1_114716178_114716657
##   [1] 16 28 33 19 19 20 15  3 10 25 39 15 25 13 19 40 18 16 11  9 27 21 22  3 23
##  [26]  5 13 18 23 10 18 12 22  9 24 55 13 13 23 17 27  6  6  8 11  8 23 29 25 17
##  [51]  5 10  9  9 25 10 21 20 30 22 10 13 13  9 12  2 19 15  1 16 13 17 25 15 26
##  [76] 26 18 15 15 12  4 10 32 17 12 17 33 18 14 12 31 24 15 30 21 22 28  9 15 51
## [101] 20 11 27  4 19 31 28 23 16 24 10 12 14 21 16 22 35 17  1 15 20  8  9  6 33
## [126] 14 20 11 12  9 17 12 16 31 31 16 15 10 14  7  9 18 43 29 17  8 31 19  0 54

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr1_114713979_114716049
##   [1]  67  55  99  98  44  89  41  46  57  30 110  77 109  19  51  89  86  85
##  [19]  66  27  69  52  96  58  82  24  29  78  76  48  85  85  68  56 104 129
##  [37]  31  73  62  42  88  28  22  11  72  61  70 137  26  52  31  86  27  77
##  [55]  93  74  38  46  73 101  50  36  50  27  61  19 102  43  11  77  85  89
##  [73] 102  60  74  54  94  45  78  59  30  58 127  75  41  53  87  45  65  42
##  [91] 121  86  48  81  93 146 127  54  49  76  78  25 114  42  73  74  67  74
## [109]  53  96  62  38  44 112  37  87  60  60   0  62  79  22  80  31 153  78
## [127]  82  42  41  37  43  77  65 114  73  75  32  66  94  39  40  85  66  93
## [145]  72  38  73  99  10 124

1.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonEx1_2 <- (GeneSJ$chr1_114716178_114716657)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx2_3 <- (GeneSJ$chr1_114713979_114716049)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_ES2 <- (GeneSJ$chr1_114713979_114716657)/GeneSJ$rowSum_SJtotal*100

Download the normalized values for the assessed splice junctions of all the AML samples:

1.3 VAF

Mutated samples vaf:

1.4 Plots

1.4.1 Static Dot Plots

Canonical SJ:

Splicing alterations:

1.4.2 Interactive Dot Plots

Canonical SJ:

Splicing alteration:

1.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

1.5 Statistical Analysis

SJCounts <- GeneSJ #TCGA.NRAS.chr1-114716123-C-T.xlsx

1.5.1 Exon Skipping

1.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"]
## W = 0.22159, p-value < 2.2e-16

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.0287364

Normalized Expression Value of the Alternative SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_ES2[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.0000000 0.0000000 0.0000000 0.2040816

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_ES2 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.0287364 -0.0287364 -0.0287364  0.1753452

1.5.1.2 ECDF

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:11] = -0.028736, 0.11985, 0.12949,  ..., 0.58476,  1.258
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.9315068 0.9315068 0.9315068 0.9520548
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_ES2")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")
MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- NA
MUT_df$Prediction <- "Exon Skipping"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr1:114713979-114716657"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)

Download the vaf, inferred percentiles and pvalues of the mutated samples:

1.5.1.3 Mann Whitney

Normality:

shapiro.test(SJCounts$Normalized_ES2[SJCounts$GROUP== "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"]
## W = 0.22159, p-value < 2.2e-16
shapiro.test(SJCounts$Normalized_ES2[SJCounts$GROUP== "MUT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES2[SJCounts$GROUP == "MUT"]
## W = 0.62978, p-value = 0.001241

Mann Whitney:

wt <- wilcox.test(x=SJCounts$Normalized_ES2[SJCounts$GROUP== "MUT"], 
                  y=SJCounts$Normalized_ES2[SJCounts$GROUP== "WT"],
                  alternative = "two.sided", 
                  paired = FALSE, 
                  conf.int = 0.95)

wt
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  SJCounts$Normalized_ES2[SJCounts$GROUP == "MUT"] and SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"]
## W = 343, p-value = 0.1924
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -4.015703e-05  2.965998e-05
## sample estimates:
## difference in location 
##           1.138317e-05

2 KRAS chr12,25245347,C,T

Variant found in 2 patients of the TCGA (2 samples).

  • Patients with KRAS chr12,25245347,C,T variant: 2 patients (2 samples)
  • Patients with the variant and RNASeq for validation: 2 patients (2 samples)

The splicing alterations being assessed are:

  • Donor Gain: predicted at 1bp from the variant, chr12:25245345, found in the mutated samples.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"KRAS_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set the sample’s group: Mutated (MUT) or No Mutated (WT)

samples_df <- found_variants[found_variants$Gene=="KRAS" & found_variants$MutationKey_Hg38 == "chr12,25245347,C,T",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

2.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

2.1.1 Donor Gain

Search: predicted a 1bp from the variant, chr12:25245345

Show all the splice junctions containing the positions between 25245340 - 25245349

colnames(GeneSJ)[grep("2524534",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.

2.2 VAF

Mutated samples vaf:

3 KMT2D chr12,49022063,G,A

Variant found in 1 patient of the TCGA (1 sample).

  • Patients with KMT2D chr12,49022063,G,A variant: 1 patient (1 sample)
  • Patients with the variant and RNASeq for validation: 1 patient (1 sample)

The splicing alterations being assessed are:

  • Donor Gain: predicted at 2bp from the variant, chr12:49022064, not found in the splice junction collection.
  • Donor Gain: chr12:49022070, not found in the splice junction collection.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"KMT2D_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set the sample’s group: Mutated (MUT) or No Mutated (WT)

samples_df <- found_variants[found_variants$Gene=="KMT2D" & found_variants$MutationKey_Hg38 == "chr12,49022063,G,A",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

3.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

3.1.1 Donor Gain

Search: predicted at 49022065

Show all the splice junctions containing the positions from 49022060 to 49022069

colnames(GeneSJ)[grepl("4902206", colnames(GeneSJ))] 
## character(0)

Alternative SJ not found in the splice junction collection.

3.1.2 Donor Gain

Show all the splice junctions containing the positions from 49022070 to 49022079

colnames(GeneSJ)[grepl("4902207", colnames(GeneSJ))] 
## character(0)

Alternative SJ not found in the splice junction collection.

3.2 VAF

Mutated samples vaf:

4 FLT3 chr13,28018485,G,T

Variant found in 2 patients of the TCGA (2 samples).

  • Patients with FLT3 chr13,28018485,G,T variant: 2 patients (2 samples)
  • Patients with the variant and RNASeq for validation: 2 patients (2 samples)

The splicing alterations being assessed are:

  • Donor Loss: chr13:28015702-28018466; donor splice site chr13:28018466.
  • Exon Skipping 20: chr13:28015702-28023349, found in mutated samples.
  • Exon Skipping 19: chr13:28018590-28024860, not found in mutated samples.
  • Exon Skipping 19 + 20: chr13:28015702-28024860, found in mutated samples.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"FLT3_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set the sample’s group: Mutated (MUT) or No Mutated (WT)

samples_df <- found_variants[found_variants$Gene=="FLT3" & found_variants$MutationKey_Hg38 == "chr13,28018485,G,T",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

4.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

4.1.1 Exon Skipping 20

Search: chr13:28015702-28023349

Show all the splice junctions containing the position 28015702_28023349

colnames(GeneSJ)[grep("28015702_28023349",colnames(GeneSJ))]
## [1] "chr13_28015702_28023349"

Found: chr13:28015702-28023349

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr13_28015702_28023349
##   [1]  2 12  0 42 23 11  3  0 59  0  0 16  0  0 11  0 13 18  0  1 15  2  9 16  8
##  [26]  0 10 10 35 30  4  8  5  8  9  0 11  4 22 16  9  2 16  3  9  5  2  7  1 10
##  [51]  9  5 17 20  5  0 10  3 13 39 18 55  2  4  4 16  8 14 10 14  2 12 11  2  9
##  [76]  3 12 29  4  8  3  8  5 15 22  2  5  0  0  2 13  9  6 33 19  0  5  0 14 25
## [101]  0  4 12  5  6 11 16 17 20  3  0 29  6  4  0 26 10  3  3  7  0  5 16  6 23
## [126]  2 11  0 24 12 17 12  2  4 13 13 17  7 11 27 12  9  3 17 12  0  5  0  6  0

Samples with the SJ of interest:

table(GeneSJ$chr13_28015702_28023349>0) 
## 
## FALSE  TRUE 
##    23   127

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr13_28015702_28023349 > 0])
## 
## MUT  WT 
##   2 125

Alternative SJ found in the mutated samples.

4.1.2 Exon Skipping 19

Search: chr13:28018590-28024860

Show all the splice junctions containing the positions 28018590-28024860

colnames(GeneSJ)[grep("28018590_28024860",colnames(GeneSJ))]
## [1] "chr13_28018590_28024860"

Found: chr13:28018590-28024860

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr13_28018590_28024860
##   [1]  0  1  0  0  0  3  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##  [26]  0  0  2  0  1  0  0  0  0  0  0  0  2  1  0  0  0  3  0  0  0  0  0  1  0
##  [51]  0  1  3  2  1  0  0  6  0  0  3 15  0  0  1  0  0  0  0  4  0  2  1  0  0
##  [76]  1  0  0  0  0  0  3  0  0  0  1  0  0  0  0  5  2  1  1  0  0  1  0  2  0
## [101]  0  0  0  0  2  0  0  2  0  0  0  1  0  5  0  1  0  0  0  0  0  0  0  2  0
## [126]  0  0  1  1  0  0  0  2  0  1  2  2  0  0  1  0  0  0  0  0  0  0  0  0  0

Samples with the SJ of interest:

table(GeneSJ$chr13_28018590_28024860>0) 
## 
## FALSE  TRUE 
##   109    41

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr13_28018590_28024860 > 0])
## 
## WT 
## 41

Alternative SJ found in the mutated samples.

4.1.3 Exon Skipping 19+20

Search: chr13:28015702-28024860

Show all the splice junctions containing the positions 28015702-28024860

colnames(GeneSJ)[grep("28015702_28024860",colnames(GeneSJ))]
## [1] "chr13_28015702_28024860"

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr13_28015702_28024860
##   [1]  0  1  0  3  0  0  0  0  7  0  0  0  0  0  0  0  0  1  0  0  2  0  0  1  0
##  [26]  0  0  0  3  1  0  0  0  0  0  0  0  0  5  1  2  0  0  0  0  0  0  0  0  0
##  [51]  2  0  1  0  0  0  1  0  0  4  0 19  0  0  0  8  0  0  6  1  0  0  0  0  0
##  [76]  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
## [101]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 10  0  0  0  0  0  0  0  0  0
## [126]  0  0  0  0  0  0  2  0  0  0  2  5  0  0  1  0  0  0  0  0  0  0  0  0  0

Samples with the SJ of interest:

table(GeneSJ$chr13_28015702_28024860 >0) 
## 
## FALSE  TRUE 
##   124    26

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr13_28015702_28024860 > 0])
## 
## MUT  WT 
##   1  25

Alternative SJ found in the mutated samples.

4.1.4 Canonical SJ

Exon 19-20 chr13:28018590-28023349

Exon 20-21 chr13:28015702-28018466; splice site chr13:28018466

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr13_28018590_28023349
##   [1] 249 515  30 590 516 602  66   9 466 125  64 282 229  30 328  89 254 370
##  [19] 131 207 569 143 286 248 238 131  82 112 692 491  75 269 162 338 397  11
##  [37] 268 239 451 751 302  48 306  59 233 171  73 434 225 347 293 166 494 548
##  [55] 186  47 134 275 527 544 335 415  66  66 315 197 307 347 209 239 143 195
##  [73] 337  70 341 194 449 349  80 134 105 461 359 360 487 342 163  53  47  22
##  [91] 447 200 121 870 399  26 206  35 316 847  31  74 374 350 274 263 636 443
## [109] 314 575  91 575 227 279 109 681 419 164 100 224  93 187 429 192 912  99
## [127] 185 173 223 498 202 678 170 257 242 188 220 345 113 194 285 273 248 518
## [145] 335  65 144 102 271   3

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr13_28015702_28018466
##   [1]  248  498   29  616  468  622  108   14  539  164   71  291  235   33  303
##  [16]   99  270  395  126  266  573  156  276  338  263  181  116  142  660  510
##  [31]   78  283  199  299  405    6  331  263  453  772  333   48  336   49  224
##  [46]  202   62  375  235  302  324  185  508  536  177   54  160  365  501  528
##  [61]  314  476  106   73  298  312  315  422  199  182  114  204  310   70  428
##  [76]  214  482  507   96  201   95  471  342  386  541  360  164   54   64   22
##  [91]  501  241  123  844  374   33  244   52  347  838   36   69  407  330  296
## [106]  322  589  441  319  567   83  592  253  313  116  691  428  171  181  241
## [121]  112  244  439  212 1000   92  190  191  216  499  217  710  168  270  256
## [136]  251  226  349  103  241  332  270  251  557  332   74  133  121  331    2

4.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonEx19_20 <- (GeneSJ$chr13_28018590_28023349)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx20_21 <- (GeneSJ$chr13_28015702_28018466)/GeneSJ$rowSum_SJtotal*100

GeneSJ$Normalized_SE20 <- (GeneSJ$chr13_28015702_28023349)/GeneSJ$rowSum_SJtotal*100
GeneSJ$INCLUSION_Ex20 <- GeneSJ$chr13_28018590_28023349 + GeneSJ$chr13_28015702_28018466
GeneSJ$PSI_SE20 <- (GeneSJ$INCLUSION_Ex20)/(GeneSJ$chr13_28015702_28023349+GeneSJ$INCLUSION_Ex20)

GeneSJ$Normalized_SE1920 <- (GeneSJ$chr13_28015702_28024860)/GeneSJ$rowSum_SJtotal*100

Download the normalized values for the assessed splice junctions of all the AML samples:

4.3 VAF

Mutated samples vaf:

4.4 Plots

4.4.1 Static Dot Plots

Canonical splice junction: Exon 20-21 chr13:28015702-28018466; donor splice site chr13:28018466

Splicing alterations:

4.4.2 Interactive Dot Plots

Canonical splice junction: Exon 20-21 chr13:28015702-28018466; donor splice site chr13:28018466

Splicing alterations:

4.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

4.5 Statistical Analysis

SJCounts <- GeneSJ

4.5.1 Donor Loss

4.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonEx20_21[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx20_21[SJCounts$GROUP == "WT"]
## W = 0.87793, p-value = 1.08e-09

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx20_21[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 4.634878

Normalized Expression Value of the Alternative SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_CanonEx20_21[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 4.701734 3.881225

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonEx20_21 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1]  0.06685609 -0.75365251

4.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:148] = -1.9682, -1.4945, -1.4716,  ..., 3.2724, 4.9872
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.6216216 0.1689189
print(paste0("MUT Percentile: ", v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "MUT Percentile: 0.621621621621622" "MUT Percentile: 0.168918918918919"
print(paste0("Inferred Pvalue: ", v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "Inferred Pvalue: 0.621621621621622" "Inferred Pvalue: 0.168918918918919"
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx20_21")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <-  MUT_df$ECDF

MUT_df$Prediction <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr13:28015702-28018466"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the vaf, inferred percentiles and pvalues of the mutated samples:

4.5.2 Exon Skipping 20

4.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_SE20[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_SE20[SJCounts$GROUP == "WT"]
## W = 0.85842, p-value = 1.298e-10

Value of Mean Normalized Expression of the SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_SE20[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.1460614

Normalized Expression Value of the SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_SE20[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.2938584 0.1402852

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_SE20 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1]  0.147796993 -0.005776121

4.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:126] = -0.14606, -0.12376, -0.11948,  ..., 0.37583, 0.73747
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.9121622 0.5675676
print(paste0("MUT Percentile: ", v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "MUT Percentile: 0.912162162162162" "MUT Percentile: 0.567567567567568"
1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.08783784 0.43243243
print(paste0("Inferred Pvalue: ", 1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "Inferred Pvalue: 0.0878378378378378" "Inferred Pvalue: 0.432432432432432"

Download the vaf, inferred percentiles and pvalues of the mutated samples:

4.5.3 Exon Skipping 19 + 20

4.5.3.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_SE1920[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_SE1920[SJCounts$GROUP == "WT"]
## W = 0.26624, p-value < 2.2e-16

Value of Mean Normalized Expression of the SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_SE1920[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.007997

Value of Mean Normalized Expression of the SJ in WT samples:

MUT_SJi <- SJCounts$Normalized_SE1920[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.02938584 0.00000000

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_SE1920 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1]  0.02138884 -0.00799700

4.5.3.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:26] = -0.007997, -0.0025172, -0.00058025,  ..., 0.15099, 0.29722
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.9256757 0.8310811
print(paste0("MUT Percentile: ", v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "MUT Percentile: 0.925675675675676" "MUT Percentile: 0.831081081081081"
1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.07432432 0.16891892
print(paste0("Inferred Pvalue: ", 1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "Inferred Pvalue: 0.0743243243243243" "Inferred Pvalue: 0.168918918918919"
v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.9256757 0.8310811
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_SE1920")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1- MUT_df$ECDF

MUT_df$Prediction <- "Exon Skipping"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr13:28015702-28024860"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the vaf, inferred percentiles and pvalues of the mutated samples:

5 IDH1

Three (3) variants: chr2,208248389,G,A, chr2,208248389,G,T & chr2,208248389,G,C

Variants found in 16 patients of the TCGA (16 samples)

  • Patients with IDH1 chr2,208248389,G,A variant: 14 patients (14 samples)
  • Patients with IDH1 chr2,208248389,G,T variant: 1 patient (1 sample)
  • Patients with IDH1 chr2,208248389,G,C variant: 1 patient (1 sample)

Patients with the variant and RNASeq for validation: 11 patients (11 samples)

  • Patients with IDH1 chr2,208248389,G,A variant and RNASeq for validation: 9 patients (9 samples)
  • Patients with IDH1 chr2,208248389,G,T variant and RNASeq for validation: 1 patients (1 sample)
  • Patients with IDH1 chr2,208248389,G,C variant and RNASeq for validation: 1 patients (1 sample)

The splicing alterations being assessed are:

  • Donor Gain: chr2:208245425-208248422, found in the mutated samples.
  • Exon Skipping 4: chr2:208245425-208251429, not found in the splice junction collection.
  • Donor Loss: chr2:208245425-208248368; donor splice site chr2:208248368.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"IDH1_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set the sample’s group: Mutated (MUT) or No Mutated (WT)

samples_df <- found_variants[found_variants$Gene=="IDH1" & found_variants$MutationKey_Hg38 %in% c( "chr2,208248389,G,A","chr2,208248389,G,T" ,"chr2,208248389,G,C"),]
R132C <- samples_df$RNA_Sample[samples_df$MutationKey_Hg38 == "chr2,208248389,G,A" & samples_df$Validable =="Validable"] #n=9 G>A
R132G <- samples_df$RNA_Sample[samples_df$MutationKey_Hg38 == "chr2,208248389,G,C" & samples_df$Validable =="Validable"] #n=1 G>C
R132S  <- samples_df$RNA_Sample[samples_df$MutationKey_Hg38 == "chr2,208248389,G,T" & samples_df$Validable =="Validable"] #n=1 G>T

cases <- append(R132C, R132G)
cases <- append(cases, R132S)

GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% R132C, "MUT", 
                                ifelse(GeneSJ$sample_id %in% R132G, "MUT",
                                      ifelse(GeneSJ$sample_id %in% R132S, "MUT", 
                                                     "WT")))


GeneSJ$Variant_status <- ifelse(GeneSJ$sample_id %in% R132C, "R132C", 
                                ifelse(GeneSJ$sample_id %in% R132G, "R132G",
                                              ifelse(GeneSJ$sample_id %in% R132S, "R132S", 
                                                     "WT")))

5.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

5.1.1 Donor Gain

Search: chr2:208245425-208248422

Show all the splice junctions containing the position 208245425-208248422

colnames(GeneSJ)[grep("208245425_208248422",colnames(GeneSJ))]
## [1] "chr2_208245425_208248422"

Found: chr2_208245425_208248422

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr2_208245425_208248422
##   [1] 0 0 0 0 0 0 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0
## [149] 0 0

Samples with the SJ of interest:

table(GeneSJ$chr2_208245425_208248422>0) 
## 
## FALSE  TRUE 
##   143     7

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr2_208245425_208248422 > 0])
## 
## MUT  WT 
##   3   4

Alternative SJ found in the mutated samples.

5.1.2 Exon Skipping 4

Search: chr2:208245425-208251429

Show all the splice junctions containing the positions 208245425-208251429

colnames(GeneSJ)[grep("208245425_208251429",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.

5.1.3 Canonical SJ

Exon 3-4: chr2:208248661-208251429

Exon 4-5: chr2:208245425-208248368; splice site chr2:208248368

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr2_208248661_208251429
##   [1] 106 188  65 275 159  47 122  94 134 100 188 180 224  33 253 181  99  92
##  [19] 169  53  83 104 111  44 273  97 128  28  79 262  74  57  62 151 150 122
##  [37]  97 152  77 488 371  60 129 184 156  81 131 166  60  70 264 196 114 100
##  [55] 245  95  62 114 326 197 213  20  68  13  78  32 111  67   7  55 302 123
##  [73] 102  34 209  77 230  64 123 124 123 130 106 193 291  80 177  43  96  95
##  [91]  82 226  93  92 130  73 369 132 167 363  53  85 488 265 159  67 433 162
## [109] 251 388 111  91  68 170  95 230 217 141  64 156 181 206 100  95 288  57
## [127] 187  72 201  68 212 185 154 142  95 121  72 113 177  55 192 351  59 177
## [145] 186 110  54  92  51 144

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr2_208245425_208248368
##   [1]  81 171  42 260 115  30  97  72 120  66 124 147 165  13 243 146  86  58
##  [19] 152  37  71  58  77  41 234  40  99  16  45 269  80  51  64 129 146  99
##  [37]  87 149  67 480 283  43  93  91 154  63 115 116  45  53 274 172  97  92
##  [55] 208 102  53  73 240 181 194  20  55   8 104  32 117  42   6  48 253 108
##  [73] 119  37 158  38 217  62 125 112  55 122 104 160 272  54 121  27  69  82
##  [91]  60 186  80  53  90  72 312 104 134 360  25  80 430 183 124  57 401 132
## [109] 200 383 103  63  48 143  78 198 190 117  42 141 142 131  62  69 230  36
## [127] 117  72 172  54 222 186 104 141  70 132  69 102 196  33 172 347  37 174
## [145] 122 126  28 133  72 144

5.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonEx3_4 <- (GeneSJ$chr2_208248661_208251429)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx4_5 <- (GeneSJ$chr2_208245425_208248368)/GeneSJ$rowSum_SJtotal*100

GeneSJ$Normalized_DGEx4 <- (GeneSJ$chr2_208245425_208248422)/GeneSJ$rowSum_SJtotal*100

Download the normalized values for the assessed splice junctions of all the AML samples:

5.3 VAF

Mutated samples vaf:

5.4 Plots

5.4.1 Static Dot Plots

Canonical splice junction: Exon 4-5: chr2:208245425-208248368; donor splice site chr2:208248368

Splicing alterations:

5.4.2 Interactive Dot Plots

Canonical splice junction: Exon 4-5: chr2:208245425-208248368; donor splice site chr2:208248368

Splicing alteration:

5.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

5.5 Statistical Analysis

SJCounts <- GeneSJ

5.5.1 Donor Gain

5.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_DGEx4[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_DGEx4[SJCounts$GROUP == "WT"]
## W = 0.13758, p-value < 2.2e-16

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_DGEx4[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.0033961

Normalized Expression Value of the Alternative SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_DGEx4[SJCounts$GROUP == "MUT"]
MUT_SJi
##  [1] 0.08210181 0.00000000 0.00000000 0.30581040 0.00000000 0.00000000
##  [7] 0.00000000 0.00000000 0.26954178 0.00000000 0.00000000

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_DGEx4 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts[SJCounts$GROUP == "MUT", c("sample_id", "Difference")]
##            sample_id  Difference
## 11  TCGA-AB-2901-03A  0.07870571
## 21  TCGA-AB-2822-03A -0.00339610
## 35  TCGA-AB-2863-03A -0.00339610
## 74  TCGA-AB-2867-03A  0.30241430
## 86  TCGA-AB-2919-03A -0.00339610
## 98  TCGA-AB-2992-03A -0.00339610
## 112 TCGA-AB-2990-03B -0.00339610
## 133 TCGA-AB-2821-03A -0.00339610
## 140 TCGA-AB-3011-03A  0.26614568
## 143 TCGA-AB-2928-03A -0.00339610
## 149 TCGA-AB-2977-03B -0.00339610

5.5.1.2 ECDF

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:5] = -0.0033961, 0.03808, 0.07112, 0.13785, 0.21143
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
##  [1] 0.9856115 0.9712230 0.9712230 1.0000000 0.9712230 0.9712230 0.9712230
##  [8] 0.9712230 1.0000000 0.9712230 0.9712230
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_DGEx4")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <-NA

MUT_df$Prediction <- "Donor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr2:208245425-208248422"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)

Download the vaf, inferred percentiles and pvalues of the mutated samples:

5.5.1.3 Kruskal Wallis

Kruskal Wallis:

kruskal.test(Normalized_DGEx4 ~ Variant_status, data = SJCounts)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Normalized_DGEx4 by Variant_status
## Kruskal-Wallis chi-squared = 28.911, df = 3, p-value = 2.338e-06

Pairwise comparisons:

pairwise.wilcox.test(SJCounts$Normalized_DGEx4, SJCounts$Variant_status)
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  SJCounts$Normalized_DGEx4 and SJCounts$Variant_status 
## 
##       R132C R132G R132S  
## R132G 1.00  -     -      
## R132S 0.80  1.00  -      
## WT    0.02  1.00  6.6e-07
## 
## P value adjustment method: holm

Detailed:

pkw <- pairwise_wilcox_test(SJCounts,Normalized_DGEx4 ~ Variant_status)
pkw
## # A tibble: 6 × 9
##   .y.           group1 group2    n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>         <chr>  <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 Normalized_D… R132C  R132G      9     1       5.5 8.04e-1 1   e+0 ns          
## 2 Normalized_D… R132C  R132S      9     1       1   1.99e-1 7.96e-1 ns          
## 3 Normalized_D… R132C  WT         9   139     748.  4   e-3 2   e-2 *           
## 4 Normalized_D… R132G  R132S      1     1       0   1   e+0 1   e+0 ns          
## 5 Normalized_D… R132G  WT         1   139      67.5 8.98e-1 1   e+0 ns          
## 6 Normalized_D… R132S  WT         1   139     139   1.09e-7 6.54e-7 ****

5.5.2 Donor Loss

5.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"]
## W = 0.97019, p-value = 0.003895

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 11.69597

Normalized Expression Value of the Alternative SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "MUT"]
MUT_SJi
##  [1] 10.180624 12.434326 12.618842 11.314985  8.047690 12.149533  9.183673
##  [8] 11.366120  8.894879  8.604651 14.723926

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonEx4_5 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
##  [1] -1.5153475  0.7383542  0.9228703 -0.3809868 -3.6482815  0.4535612
##  [7] -2.5122980 -0.3298513 -2.8010928 -3.0913203  3.0279549

5.5.2.2 ECDF

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:137] = -6.5809, -5.8401, -5.6354,  ..., 4.0065, 4.9974
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
##  [1] 0.20143885 0.62589928 0.64748201 0.34532374 0.05035971 0.52517986
##  [7] 0.11510791 0.35251799 0.10071942 0.07194245 0.97122302
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx4_5")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- NA
MUT_df$Prediction  <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr2:208245425-208248368"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)

Download the vaf, inferred percentiles and pvalues of the mutated samples:

5.5.2.3 Kruskal Wallis

Kruskal Wallis:

kruskal.test(Normalized_CanonEx4_5 ~ Variant_status, data = SJCounts)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Normalized_CanonEx4_5 by Variant_status
## Kruskal-Wallis chi-squared = 4.1238, df = 3, p-value = 0.2484

Pairwise comparisons:

pairwise.wilcox.test(SJCounts$Normalized_CanonEx4_5, SJCounts$Variant_status)
## 
##  Pairwise comparisons using Wilcoxon rank sum exact test 
## 
## data:  SJCounts$Normalized_CanonEx4_5 and SJCounts$Variant_status 
## 
##       R132C R132G R132S
## R132G 1     -     -    
## R132S 1     1     -    
## WT    1     1     1    
## 
## P value adjustment method: holm

Detailed:

pkw <- pairwise_wilcox_test(SJCounts,Normalized_CanonEx4_5 ~ Variant_status)
pkw
## # A tibble: 6 × 9
##   .y.               group1 group2    n1    n2 statistic     p p.adj p.adj.signif
## * <chr>             <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <chr>       
## 1 Normalized_Canon… R132C  R132G      9     1         7 0.6       1 ns          
## 2 Normalized_Canon… R132C  R132S      9     1         7 0.6       1 ns          
## 3 Normalized_Canon… R132C  WT         9   139       527 0.432     1 ns          
## 4 Normalized_Canon… R132G  R132S      1     1         1 1         1 ns          
## 5 Normalized_Canon… R132G  WT         1   139        16 0.19      1 ns          
## 6 Normalized_Canon… R132S  WT         1   139        14 0.174     1 ns

6 CBL chr11,119278165,G,C & chr11,119278164,A,T

Variant found in 2 patients of the TCGA (2 samples).

  • Patients with CBL chr11,119278165,G,C variant: 1 patient (1 sample)

  • Patients with CBL chr11,119278164,A,T variant: 1 patient (1 sample)

  • Patients with the variant and RNASeq for validation: 2 patients (2 samples)

The splicing alterations being assessed are:

  • Aceptor Gain: chr11:119277845-119278189, found only in the mutated samples.
  • Aceptor Gain: chr11:119277845-119278237, found only in the mutated samples.
  • Aceptor Loss: chr11:119277845-119278165; aceptor splice site chr11:119278165.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"CBL_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set the sample’s group: Mutated (MUT) or No Mutated (WT)

samples_df <- found_variants[found_variants$Gene=="CBL" & found_variants$MutationKey_Hg38 %in% c("chr11,119278165,G,C","chr11,119278164,A,T"),]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

6.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

6.1.1 Aceptor Gain

Search: chr11:119277845-119278189

Show all the splice junctions containing the position 119277845-119278189

colnames(GeneSJ)[grep("119277845_119278189",colnames(GeneSJ))]
## [1] "chr11_119277845_119278189"

Found: chr11_119277845_119278189

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr11_119277845_119278189
##   [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##  [26]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##  [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  5  0  0  0  0  0  0  0  0  0  0
##  [76]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 15
## [101]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## [126]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Samples with the SJ of interest:

table(GeneSJ$chr11_119277845_119278189>0) 
## 
## FALSE  TRUE 
##   148     2

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr11_119277845_119278189 > 0])
## 
## MUT 
##   2

Alternative SJ found in the mutated samples.

6.1.2 Aceptor Gain

Search: chr11:119277845-119278237

Show all the splice junctions containing the position 119277845-119278237

colnames(GeneSJ)[grep("119277845_119278237",colnames(GeneSJ))]
## [1] "chr11_119277845_119278237"

Found: chr11_119277845_119278237

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr11_119277845_119278237
##   [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##  [26]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##  [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  8  0  0  0  0  0  0  0  0  0  0
##  [76]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 20
## [101]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## [126]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Samples with the SJ of interest:

table(GeneSJ$chr11_119277845_119278237>0) 
## 
## FALSE  TRUE 
##   148     2

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr11_119277845_119278237 > 0])
## 
## MUT 
##   2

Alternative SJ found in the mutated samples.

6.1.3 Canonical SJ

Exon 7-8: chr11:119277845-119278165; aceptor splice site: chr11:119278165

Exon 8-9: chr11:119278298-119278509

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr11_119277845_119278165
##   [1]  59  78  85 131  67 134  56  98  49  33 101 101 174  43  64  68  78  59
##  [19]  62  49  58 122 138  61  91   6  58  44  64  33 106  48  75  34  63  21
##  [37]  45  78  59  63  66  66  36  42  95  57  87 142  35  60  17 119  41  49
##  [55] 103  48  28  33  98 122 117  10  81  30  44  21 127  59  10  47  98 105
##  [73] 129  52  66  12  76  30  39  46  20 104  63  55  18  63  83  53  64  58
##  [91]  60  71  45 136  66  46  99  94  56  60  86  40  95 102  76  61  39  63
## [109]  74 102 133  18  44 107  73  95  88  61  14  42  64  60  86  14 167  59
## [127] 162  55  68  32  79  73 113 108  99  63  42  72 111  34  52 119  70 167
## [145]  77  36 117  91   5  19

Reads of all the AML samples (mutated and no mutated) for the splice junction:

GeneSJ$chr11_119278298_119278509
##   [1]  78  87  94 136  67 175  49  93  59  51 105 119 146  12  48  85  80  85
##  [19]  64  63  46  69 162  48 109   4  74  50  70  45 106  40  85  51  87  21
##  [37]  41  89  69  83  99  76  46  41 117  47  95 126  10  56  20 133  46  42
##  [55] 112  43  25  30 103 153 129  17  70  22  75  13 123  46   8  34 131 104
##  [73] 124  53  96  37  81  28  50  73  20 139  71  52  10  76  79  74  78  61
##  [91]  66 103  62 153  70  40 133  97  67 204  58  30 109 151 103  80  45  67
## [109]  95 108 116  20  50 117  72 106  76  97  23  54  92  63  55  22 165  83
## [127] 155  73  70  51  74  74 136 128 114  49  43  94 113  32  34 132  84 184
## [145]  82  45  96  84   9  24

6.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonEx7_8 <- (GeneSJ$chr11_119277845_119278165)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx8_9 <- (GeneSJ$chr11_119278298_119278509)/GeneSJ$rowSum_SJtotal*100

GeneSJ$Normalized_Ex8AG_chr11.119277845.119278189 <- (GeneSJ$chr11_119277845_119278189)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_Ex8AG_chr11.119277845.119278237 <- (GeneSJ$chr11_119277845_119278237)/GeneSJ$rowSum_SJtotal*100

Download the normalized values for the assessed splice junctions of all the AML samples:

6.3 VAF

Mutated samples vaf:

6.4 Plots

6.4.1 Static Dot Plots

Canonical splice junction: Exon 7-8: chr11:119277845-119278165; aceptor splice site: chr11:119278165

Splicing alterations:

6.4.2 Interactive Dot Plots

Canonical splice junction: Exon 7-8: chr11:119277845-119278165; aceptor splice site: chr11:119278165

Splicing alteration:

6.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

6.5 Statistical Analysis

SJCounts <- GeneSJ

6.5.1 Aceptor Loss

6.5.1.1 Expression Difference

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx7_8[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 8.955325

Normalized Expression Value of the Alternative SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_CanonEx7_8[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 4.503582 2.631579

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonEx7_8 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -4.451743 -6.323747

6.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:145] = -5.2402, -4.1232, -3.8271,  ..., 4.1732, 4.3337
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.006756757 0.000000000
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx7_8")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Aceptor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr11:119277845-119278165"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the vaf, inferred percentiles and pvalues of the mutated samples:

6.5.2 Aceptor Gain

6.5.2.1 Expression Difference

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_Ex8AG_chr11.119277845.119278189[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0

Normalized Expression Value of the Alternative SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_Ex8AG_chr11.119277845.119278189[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.5117707 0.6578947

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_Ex8AG_chr11.119277845.119278189 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.5117707 0.6578947

6.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:1] =      0
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 1 1
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_Ex8AG_chr11.119277845.119278189")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1- MUT_df$ECDF

MUT_df$Prediction <- "Aceptor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr11:119277845-119278189"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the vaf, inferred percentiles and pvalues of the mutated samples:

6.5.3 Aceptor Gain

6.5.3.1 Expression Difference

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_Ex8AG_chr11.119277845.119278237[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0

Normalized Expression Value of the Alternative SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_Ex8AG_chr11.119277845.119278237[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.8188332 0.8771930

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_Ex8AG_chr11.119277845.119278237 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.8188332 0.8771930

6.5.3.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:1] =      0
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 1 1
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_Ex8AG_chr11.119277845.119278237")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1- MUT_df$ECDF

MUT_df$Prediction <- "Aceptor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr11:119277845-119278237"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the vaf, inferred percentiles and pvalues of the mutated samples:

7 Results Summary

Download the vaf, inferred percentiles and pvalues of all the splicing alterations evaluated: